Scalable long-term preservation of relational data through SPARQL queries

Authors

  • Silvia Stefanova
  • Tore Risch
Abstract

We present an approach for scalable long-term archival of data stored in relational databases (RDBs) as RDF, implemented in the SAQ (Semantic Archive and Query) system. The proposed approach is suitable for archiving scientific data used in scientific publications where it is desirable to preserve only parts of an RDB, e.g. only data about a specific set of experimental artefacts in the database. With the approach, long-term preservation as RDF of selected parts of a database is specified as an archival query in an extended SPARQL dialect, A-SPARQL. The query processing is based on automatically generating an RDF view of the relational database to be archived, called the RD-view. A-SPARQL provides flexible selection of data to be archived in terms of a SPARQL-like query to the RD-view. The result of an archival query is a data archive file containing the RDF triples representing the relational data content to be preserved. The system also generates a schema archive file where sufficient meta-data are saved to allow the archived database to be fully reconstructed. An archival query usually selects both properties and their values for sets of subjects, which makes the property p in some triple patterns unknown. We call such queries, where properties are unknown, unbound-property queries. To achieve scalable data preservation and recreation, we propose query transformation strategies suitable for optimizing unbound-property queries. These query rewriting strategies were implemented and evaluated in a new benchmark for archival queries called ABench. ABench is defined as a set of typical A-SPARQL queries archiving selected parts of databases generated by the Berlin benchmark data generator. In experiments, the SAQ optimization strategies were evaluated by measuring the performance of A-SPARQL queries selecting triples for archival queries in ABench. The performance of equivalent SPARQL queries for related systems was also measured. The results showed that the proposed optimizations substantially improve the query execution time for archival queries.
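
To make the notion of an unbound-property query concrete, the following is a minimal Python sketch. It is not SAQ's implementation and not A-SPARQL: the toy table, the URI scheme, and all function names are hypothetical, chosen only for illustration. It exposes the rows of one table as triples (a stand-in for an RD-view) and evaluates a pattern whose property is unbound for a selected set of subjects, first by scanning every triple and then by selecting the rows first. The second variant is one generic way to avoid a full scan and is not necessarily one of the strategies proposed in the paper.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Triple = Tuple[str, str, object]
Rows = Dict[int, Dict[str, object]]

# Hypothetical toy table: primary key -> row (column name -> value).
PRODUCT: Rows = {
    1: {"label": "widget", "price": 10},
    2: {"label": "gadget", "price": 25},
    3: {"label": "gizmo", "price": 7},
}


def subject_uri(table: str, key: int) -> str:
    """Assumed URI scheme for a row; purely illustrative."""
    return f"http://example.org/{table}/{key}"


def property_uri(table: str, column: str) -> str:
    """Assumed URI scheme for a column; purely illustrative."""
    return f"http://example.org/{table}#{column}"


def row_triples(table: str, key: int, row: Dict[str, object]) -> Iterator[Triple]:
    """Expand one row into one triple per column."""
    s = subject_uri(table, key)
    for column, value in row.items():
        yield (s, property_uri(table, column), value)


def rd_view(table: str, rows: Rows) -> Iterator[Triple]:
    """Stand-in for an RDF view: every row of the table as triples."""
    for key, row in rows.items():
        yield from row_triples(table, key, row)


def naive_unbound(table: str, rows: Rows, subjects: Set[str]) -> List[Triple]:
    """Pattern (?s ?p ?o) with ?s restricted: scans the whole view."""
    return [t for t in rd_view(table, rows) if t[0] in subjects]


def pushed_down(table: str, rows: Rows, keys: Iterable[int]) -> List[Triple]:
    """Same answer, but rows are selected by key before triples are built."""
    out: List[Triple] = []
    for key in keys:
        row = rows.get(key)
        if row is not None:
            out.extend(row_triples(table, key, row))
    return out


selected_keys = [1, 3]
selected_subjects = {subject_uri("product", k) for k in selected_keys}
naive = naive_unbound("product", PRODUCT, selected_subjects)
fast = pushed_down("product", PRODUCT, selected_keys)
```

Both functions return every property and value of the selected subjects; the difference is that `pushed_down` touches only the selected rows, whereas `naive_unbound` generates triples for the whole table before filtering.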

Similar resources

Optimizing Unbound-property Queries to RDF Views of Relational Databases

SAQ (Semantic Archive and Query) is a system for querying and long-term preservation of relational data in terms of RDF. In SAQ, relational data in a back-end DBMS is exposed as an RDF view, called the RD-view. SAQ can process arbitrary SPARQL queries to the RD-view. In addition, long-term preservation as RDF of selected parts of a relational database is specified by SPARQL queries to the RD-view...


Scalable Preservation, Reconstruction, and Querying of Databases in terms of Semantic Web Representations

Stefanova, S. 2013. Scalable Preservation, Reconstruction, and Querying of Databases in terms of Semantic Web Representations. Acta Universitatis Upsaliensis. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1052. 59 pp. Uppsala. ISBN 978-91-554-8690-7. This Thesis addresses how Semantic Web representations, in particular RDF, can enable flexib...


Scalable Numerical SPARQL Queries over Relational Databases

We present an approach for scalable processing of SPARQL queries to RDF views of numerical data stored in relational databases (RDBs). Such queries include numerical expressions, inequalities, comparisons, etc. inside FILTERs. We call such FILTERs numerical expressions and the queries numerical SPARQL queries. For scalable execution of numerical SPARQL queries over RDBs, numerical operators sho...


Distributed Processing of Generalized Graph-Pattern Queries in SPARQL 1.1

We propose an efficient and scalable architecture for processing generalized graph-pattern queries as they are specified by the current W3C recommendation of the SPARQL 1.1 “Query Language” component. Specifically, the class of queries we consider consists of sets of SPARQL triple patterns with labeled property paths. From a relational perspective, this class resolves to conjunctive queries of ...


Generating of RDF graph from a relational database using Jena API

A great part of the existing data on the web is stored in relational databases (RDBs). However, the transition from the traditional web to the Semantic Web requires a new structuring of these data. In this context, we propose a method which allows automatic extraction of data from RDBs and their restructuring in the form of RDF graphs using the Jena API to make them available for the Semantic Web. This ...



Journal title:
  • Semantic Web

Volume 7  Issue 

Pages  -

Publication date 2016